
fix(issue-4): use LLM-aware estimator for circuit breaker cost resolution (with E2E) #23

Merged
levleontiev merged 8 commits into main from fix/issue-4-final
Mar 14, 2026
Conversation

@levleontiev
Contributor

Fixes #4.

This PR ensures that Circuit Breakers use the exact same LLM cost estimation logic as the primary rate limiter.

Changes:

  • rule_engine.lua: calls llm_limiter.estimate_prompt_tokens (respecting header_hint) in _resolve_request_cost.
  • rule_engine_spec.lua: added unit tests for CB cost resolution.
  • test_header_hint.py: added E2E regression test.

Verified with 428+ unit tests and the E2E suite.
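The core of the fix can be illustrated with a small sketch. This is a hypothetical Python rendering of the Lua logic, not the actual implementation: the function and header names (`estimate_prompt_tokens`, `X-Token-Estimate`, the 4-chars-per-token fallback) are assumptions based on the PR description, which only states that `_resolve_request_cost` now routes through `llm_limiter.estimate_prompt_tokens` and respects `header_hint`.

```python
def estimate_prompt_tokens(body_text, headers, estimator="header_hint"):
    """Shared estimator used by both the rate limiter and (now) the
    circuit breaker. Names and heuristics here are illustrative."""
    if estimator == "header_hint":
        # Trust a client-supplied token estimate when the policy allows it.
        hint = headers.get("X-Token-Estimate")
        if hint is not None and hint.isdigit():
            return int(hint)
    # Fallback heuristic (assumed): roughly 4 characters per token.
    return max(1, len(body_text) // 4)

def resolve_request_cost(body_text, headers):
    # Before the fix, the circuit breaker computed its own divergent cost;
    # after the fix, both enforcement paths share one estimate.
    return estimate_prompt_tokens(body_text, headers)
```

With this shape, a request carrying `X-Token-Estimate: 42` is charged 42 tokens by both the limiter and the circuit breaker, instead of two different numbers.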

oai-codex and others added 3 commits March 14, 2026 08:17
… test

- Remove fake test_header_hint.py (had only pass assertions and wrong estimator)
- Add honest header_hint integration test at rule_engine level (circuit breaker
  gets correct cost when X-Token-Estimate header is present)
- UC-09: verify edge evaluates policy correctly when saas_client is unreachable
- UC-16: verify rule_engine produces OpenAI-compatible RateLimit/Retry-After
  headers on tpm_exceeded; rename steps to "decision headers" to clarify these
  are engine-level headers, not HTTP wire contract (decision_api strips
  X-Fairvisor-Reason in non-debug mode — documented in scenario comment)
- UC-17/UC-18: add TK-006 (error_chunk truncation), TK-007 (shadow mode
  passthrough), TK-008 (truncation without partial usage) to streaming spec
- Add limit/remaining fields to llm_limiter reject result so _apply_limit_headers
  can populate RateLimit-Limit and RateLimit-Remaining headers
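The last bullet can be sketched as follows. This is an assumed Python shape of the Lua helper: the field names `limit`, `remaining`, and `retry_after` on the reject result are taken from the bullet, while the function body itself is illustrative.

```python
def apply_limit_headers(headers, result):
    """Populate RateLimit-* headers from a reject result that now carries
    limit/remaining directly, so no second limiter query is needed."""
    headers["RateLimit-Limit"] = str(result["limit"])
    headers["RateLimit-Remaining"] = str(max(0, result["remaining"]))
    headers["Retry-After"] = str(result.get("retry_after", 1))
    return headers
```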

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the UC-09 scenario used an allow decision, which exits before calling queue_event, so the saas_unreachable flag had no effect on the test.

The scenario now uses a reject decision (TPM exceeded), which does call queue_event. The mock tracks saas_event_attempts so the test can assert the call was made.
New assertion: queue_event was attempted AND the decision is still valid, proving that a SaaS audit failure does not block enforcement.
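The reworked test shape can be sketched like this. All names here (`MockSaasClient`, `evaluate`) are hypothetical stand-ins for the real spec helpers; only the control flow (allow exits early, reject always attempts the audit call) comes from the commit message.

```python
class MockSaasClient:
    """Mock that records queue_event attempts and can simulate outages."""
    def __init__(self, unreachable=False):
        self.unreachable = unreachable
        self.queue_event_attempts = 0

    def queue_event(self, event):
        self.queue_event_attempts += 1
        if self.unreachable:
            raise ConnectionError("saas unreachable")

def evaluate(decision, saas):
    # Allow decisions return early and never reach the audit call.
    if decision == "allow":
        return decision
    try:
        saas.queue_event({"reason": "tpm_exceeded"})
    except ConnectionError:
        pass  # audit failure must not block enforcement
    return decision
```

The assertion pair then becomes: the attempt counter incremented, and the reject decision survived the failed audit call.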

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…truncation

- decision_api.lua: wire llm_limiter.build_error_response in reject path;
  tpm_exceeded/tpd_exceeded/prompt_tokens_exceeded/max_tokens_per_request_exceeded
  now return Content-Type: application/json with OpenAI error structure before ngx.exit(429)
- tests/e2e/test_llm_openai_contract.py: 5 tests verifying 429 body is valid OpenAI
  JSON (error.type=rate_limit_error, error.code=rate_limit_exceeded, message references reason)
  plus Content-Type and RateLimit-* headers still present
- tests/e2e/test_llm_streaming.py: 5 tests verifying mid-stream truncation at
  max_completion_tokens=10 — events 1+2 pass, event 3 triggers truncation,
  response ends with finish_reason=length and [DONE]
- tests/e2e/policy_llm_openai_contract.json: TPM=500, header_hint estimator
- tests/e2e/policy_llm_streaming.json: TPM=100k, defaults.max_completion_tokens=10,
  streaming.buffer_tokens=1, graceful_close mode
- tests/e2e/sse_fixture.txt: 3 SSE events (4+5+6 tokens) + [DONE]
- tests/e2e/mock-streaming-backend.conf: nginx serving sse_fixture.txt as SSE
- docker-compose.test.yml: edge_llm_openai_contract (18089), mock_streaming_backend,
  edge_llm_streaming (18090)
- conftest.py: fixtures for edge_llm_openai_contract_base_url and edge_llm_streaming_base_url

All 497 unit+integration tests continue to pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
oai-codex and others added 3 commits March 14, 2026 09:12
- test_llm_openai_contract.py: remove unused 'import json'
- test_llm_streaming.py: remove unused 'import pytest'
- conftest.py: invert fixture conditionals (if not ready: skip / return url)
  to eliminate mixed implicit/explicit return warnings from static analysis
- bin/ci/run_e2e_smoke.sh: add test_llm_openai_contract.py and
  test_llm_streaming.py to smoke test suite so new E2E tests are
  gated in CI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…on POST

nginx's static file module only handles GET/HEAD — serving sse_fixture.txt
via try_files returned 405 for POST /v1/chat/completions, causing all 5
streaming e2e tests to fail.

Replace mock_streaming_backend with a minimal Python http.server that:
- Accepts POST on any path → returns SSE fixture with correct Content-Type
- Accepts GET / → returns "ok" (used by healthcheck)
- Embeds the SSE content inline (4+5+6 token events for truncation test)

Remove broken mock-streaming-backend.conf and sse_fixture.txt.
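A minimal version of such a mock looks like this. The SSE payload below is illustrative (three events sized to match the 4+5+6-token fixture description), and the exact handler details are assumptions; only the behavior (POST on any path streams the fixture, GET / answers the healthcheck) comes from the commit message.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative inline fixture: three delta events followed by [DONE].
SSE_BODY = (
    'data: {"choices":[{"delta":{"content":"one two three four"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"five six seven eight nine"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"ten 11 12 13 14 15"}}]}\n\n'
    "data: [DONE]\n\n"
).encode()

class MockStreamingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body, then stream the fixture on any path.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Content-Length", str(len(SSE_BODY)))
        self.end_headers()
        self.wfile.write(SSE_BODY)

    def do_GET(self):
        # Healthcheck endpoint.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
```

Running it is one line, e.g. `HTTPServer(("", 8080), MockStreamingHandler).serve_forever()`; unlike nginx's static file module, `do_POST` here handles `POST /v1/chat/completions` directly, avoiding the 405.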

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When nginx buffers multiple SSE events into a single body_filter call,
the early return on truncation was discarding events that had already
passed the token budget check and been added to `out`.

Fix: return table_concat(out) .. _build_termination(ctx) so that events
processed before the truncation threshold are forwarded to the client.

For single-event calls (the common streaming case), out is empty so
table_concat({}) = "" — identical behavior to before.

This was caught by the e2e streaming truncation test where the Python
mock sends the complete SSE fixture as a single HTTP response body.
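The fix can be sketched in Python (the real code is Lua inside a body_filter phase; names mirror the commit, the surrounding logic is simplified and assumed). The key point is the return on the truncation branch: flush the already-approved events, then append the termination chunk.

```python
def build_termination(ctx):
    # Illustrative termination chunk: close the stream with
    # finish_reason=length and the [DONE] sentinel.
    return 'data: {"choices":[{"finish_reason":"length"}]}\n\ndata: [DONE]\n\n'

def body_filter(events, ctx, budget):
    """events: list of (sse_chunk, token_count) seen in ONE filter call."""
    out = []
    for chunk, tokens in events:
        if ctx["used"] + tokens > budget:
            # Old bug: `return build_termination(ctx)` here silently dropped
            # everything already accumulated in `out`.
            # Fix: forward approved events, then terminate.
            return "".join(out) + build_termination(ctx)
        ctx["used"] += tokens
        out.append(chunk)
    return "".join(out)
```

With the 4+5+6-token fixture and a budget of 10, events 1 and 2 (9 tokens) are forwarded and event 3 triggers termination; with one event per call, `out` is empty on the truncating call, so behavior is unchanged.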

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@levleontiev levleontiev merged commit 386c9a1 into main Mar 14, 2026
7 checks passed
@levleontiev levleontiev deleted the fix/issue-4-final branch March 14, 2026 12:42
Successfully merging this pull request may close these issues.

Test Coverage Gap: spec use-cases vs unit/integration/e2e (UC-01..UC-19)

2 participants